In statistics, a simple random sample is a subset of individuals (a sample) chosen from a larger set (a population). Each individual is chosen randomly and entirely by chance, such that each individual has the same probability of being chosen at any stage during the sampling process, and each subset of k individuals has the same probability of being chosen for the sample as any other subset of k individuals [1]. This process is known as simple random sampling, and should not be confused with other forms of random sampling, such as systematic random sampling.
In small populations and often in large ones, such sampling is typically done "without replacement" (SRSWOR), i.e., one deliberately avoids choosing any member of the population more than once. Although simple random sampling can be conducted with replacement instead, this is less common and would normally be described more fully as simple random sampling with replacement (SRSWR). Draws made without replacement are no longer independent, but they still satisfy exchangeability, hence many results still hold. Further, for a small sample from a large population, sampling without replacement is approximately the same as sampling with replacement, since the probability of choosing the same individual twice is low.
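The difference between the two schemes can be illustrated with a minimal Python sketch. The function names, the population of 1000 integer labels, and the fixed seed are assumptions made for illustration only; `prob_no_repeat` computes the chance that n draws with replacement happen to contain no repeated individual, which is close to 1 when n is small relative to N.

```python
import random

def srs_without_replacement(population, n, rng):
    """Draw n distinct members; every subset of size n is equally likely."""
    return rng.sample(population, n)

def srs_with_replacement(population, n, rng):
    """Draw n members independently; the same member may appear more than once."""
    return rng.choices(population, k=n)

def prob_no_repeat(N, n):
    """Probability that n draws with replacement from N units are all distinct."""
    p = 1.0
    for i in range(n):
        p *= (N - i) / N
    return p

rng = random.Random(0)               # fixed seed, chosen only for reproducibility
population = list(range(1000))       # hypothetical population labels 0..999
swor = srs_without_replacement(population, 10, rng)
swr = srs_with_replacement(population, 10, rng)

# For n = 10 and N = 1000, a with-replacement sample rarely contains a repeat.
print(prob_no_repeat(1000, 10))
```

When the sample is a large fraction of the population, `prob_no_repeat` drops quickly and the two schemes can no longer be treated as interchangeable.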
An unbiased random selection of individuals is important so that in the long run, the sample represents the population. However, this does not guarantee that a particular sample is a perfect representation of the population. Simple random sampling merely allows one to draw externally valid conclusions about the entire population based on the sample.
Conceptually, simple random sampling is the simplest of the probability sampling techniques. It requires a complete sampling frame, which may not be available or feasible to construct for large populations. Even if a complete frame is available, more efficient approaches may be possible if other useful information is available about the units in the population.
Its advantages are that it is free of classification error and that it requires minimal advance knowledge of the population other than the frame. Its simplicity also makes it relatively easy to interpret data collected via SRS. For these reasons, simple random sampling best suits situations where not much information is available about the population and data collection can be efficiently conducted on randomly distributed items, or where the cost of sampling is small enough to make efficiency less important than simplicity. If these conditions do not hold, stratified sampling or cluster sampling may be a better choice.
In a simple random sample, individuals are drawn from the population at random, with no order or pattern governing which individual is chosen.
Suppose you had a school with 1000 students, divided equally into boys and girls, and you wanted to select 100 of them for further study. You might put all their names in a bucket and then pull 100 names out. Not only does each person have an equal chance of being selected, but we can also easily calculate the probability of a given person being chosen, since we know the sample size (n) and the population size (N), and it becomes a simple matter of division:
n/N or 100/1000 = 0.10 (10%)
This means that every student in the school has a 10% or 1 in 10 chance of being selected using this method. Further, all combinations of 100 students have the same probability of selection.
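The bucket draw described above can be sketched in Python as follows. The roster labels and the seed are hypothetical; `random.sample` is used here as a stand-in for pulling names out of a bucket, since it draws without replacement and gives every subset of the requested size the same chance.

```python
import random
from math import comb

N, n = 1000, 100
# Hypothetical roster: labels student_0001 .. student_1000
students = [f"student_{i:04d}" for i in range(1, N + 1)]

rng = random.Random(42)              # fixed seed, only so the draw is repeatable
sample = rng.sample(students, n)     # 100 distinct students

inclusion_probability = n / N        # chance that any given student is selected
possible_samples = comb(N, n)        # number of equally likely samples of size 100
probability_of_each_sample = 1 / possible_samples
```

Every one of the `possible_samples` combinations has the same probability, which is the property that distinguishes simple random sampling from designs that merely give each individual an equal chance.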
If a systematic pattern is introduced into random sampling, it is referred to as "systematic (random) sampling". For instance, suppose the students in our school had numbers attached to their names ranging from 0001 to 1000, and we chose a random starting point, e.g. 0533, and then picked every 10th name thereafter to give us our sample of 100 (starting over with 0003 after reaching 0993). In this sense, this technique is similar to cluster sampling, since the choice of the first unit determines the remainder. This is no longer simple random sampling, because some combinations of 100 students have a larger selection probability than others: for instance, {3, 13, 23, ..., 993} has a 1/10 chance of selection, while {1, 2, 3, ..., 100} cannot be selected under this method.
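A short Python sketch of this systematic scheme makes the contrast concrete. The function name `systematic_sample` and its parameters are illustrative assumptions; the wrap-around rule follows the example above (after 0993 comes 0003). Enumerating every possible starting point shows that only 10 distinct samples can ever occur.

```python
import random

def systematic_sample(N, step, start):
    """Return IDs start, start+step, ... (IDs run 1..N, wrapping around past N)."""
    n = N // step
    return sorted(((start - 1 + i * step) % N) + 1 for i in range(n))

N, step = 1000, 10
rng = random.Random(1)               # fixed seed, for reproducibility only
start = rng.randint(1, N)            # random starting point between 1 and N
sample = systematic_sample(N, step, start)

# Starting at 533 yields exactly the set {3, 13, 23, ..., 993}.
from_533 = systematic_sample(N, step, 533)

# Collect the distinct samples reachable from all 1000 starting points.
distinct_samples = {tuple(systematic_sample(N, step, s)) for s in range(1, N + 1)}
print(len(distinct_samples))
```

Because each of the 10 reachable samples arises from 100 of the 1000 starting points, each has probability 1/10, and every other combination of 100 students has probability zero.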
If the members of the population come in two kinds, say "red" and "black", one can consider the distribution of the number of red elements in a sample of a given size. That distribution depends on the numbers of red and black elements in the full population. For a simple random sample with replacement, the distribution is a binomial distribution. For a simple random sample without replacement, one obtains a hypergeometric distribution.
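The two distributions can be compared numerically with a small Python sketch. The function names and the example population (50 red units among 100, sample of 10) are assumptions chosen for illustration; the formulas are the standard hypergeometric and binomial probability mass functions.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(exactly k red) in n draws without replacement, with K red among N units."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    """P(exactly k red) in n draws with replacement, each red with probability p."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical population: 50 red and 50 black units, sample of size 10.
N, K, n = 100, 50, 10
for k in range(n + 1):
    without = hypergeom_pmf(k, N, K, n)   # sampling without replacement
    with_rep = binom_pmf(k, n, K / N)     # sampling with replacement
    print(k, round(without, 4), round(with_rep, 4))
```

As the population grows while the proportion of red units and the sample size stay fixed, the hypergeometric probabilities approach the binomial ones, which matches the earlier remark that the two sampling schemes nearly coincide for a small sample from a large population.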